Distributed-Memory Breadth-First Search on Massive Graphs
نویسندگان
چکیده
In this chapter, we study the problem of traversing large graphs. A traversal, a systematic method of exploring all the vertices and edges in a graph, can be done in many different orders. A traversal in “breadth-first” order, a breadth-first search (BFS), is important because it serves as a building block for many graph algorithms. Parallel graph algorithms increasingly rely on BFS for exploring all vertices in a graph because depth-first search is inherently sequential. Fast parallel graph algorithms often use BFS, even when the optimal sequential algorithm for solving the same problem relies on depth-first search. Strongly connected component decomposition of a graph [27, 32] is an example of such an computation. Given a distinguished source vertex s, BFS systematically explores the graph G to discover every vertex that is reachable from s. In the worst case, BFS has to explore all of the edges in the connected component that s belongs to in order to reach every vertex in the connected component. A simple level-synchronous traversal that explores all of the outgoing edges of the current frontier (the set of vertices discovered in this level) is therefore considered optimal in the worst-case analysis. This level-synchronous algorithm exposes lots of parallelism for low-diameter (small-world) graphs [35]. Many real-world graphs, such as those representing social interactions and brain anatomy, are known to have small-world characteristics. Parallel BFS on distributed-memory systems is challenging due to its low computational intensity and irregular data access patterns. Recently, a large body of optimization strategies have been designed to improve the performance of parallel BFS in distributed-memory systems. Among those, two major techniques stand out in terms of their success and general applicability. The first one is the direction-optimization by Beamer et al.[4] that optimistically reduces the number of edge examinations by integrating a bottom-up algorithm into the traversal. The second one is the use of two-dimensional (2D) decomposition of the sparse adjacency matrix of the graph [6, 37]. In this article, we build on our prior distributed-memory BFS work [5, 6] by expanding our performance optimizations. We generalize our 2D approach to arbitrary pr × pc rectangular processor grids (which subsumes the 1D cases in its extremes of pr = 1 or pc = 1). We evaluate the effects of in-node multithreading for performance and scalability. We run our experiments on three different platforms: a Cray XE6, a Cray XK7 (without using co-processors), and a Cray XC30. We compare the effects of using a scalable hypersparse representation called Doubly Compressed Sparse Column (DCSC) for storing local subgraphs versus a simpler but less memory-efficient Compressed Sparse Row (CSR) representation. Finally, we validate our implementation by using it to efficiently traverse a real-world graph.
منابع مشابه
Breadth First Search on Massive Graphs
We consider the problem of Breadth First Search (BFS) traversal on massive sparse undirected graphs. Despite the existence of simple linear time algorithms in the RAM model, it was considered non-viable for massive graphs because of the I/O cost it incurs. Munagala and Ranade [29] and later Mehlhorn and Meyer [27] gave efficient algorithms (refered to as MR BFS and MM BFS, respectively) for com...
متن کاملFrom Distributed Memory Cycle Detection to Parallel LTL Model Checking
In [2] we proposed a parallel graph algorithm for detecting cycles in very large directed graphs distributed over a network of workstations. The algorithm employs back-level edges as computed by the breadth first search. In this paper we describe how to turn the algorithm into an explicit state distributed memory LTL model checker by extending it with detection of accepting cycles, counterexamp...
متن کاملGraph partitioning for scalable distributed graph computations
Inter-node communication time constitutes a significant fraction of the execution time of graph algorithms on distributed-memory systems. Global computations on large-scale sparse graphs with skewed degree distributions are particularly challenging to optimize for, as prior work shows that it is difficult to obtain balanced partitions with low edge cuts for these graphs. In this work, we attemp...
متن کاملDistributed Graph Layout for Scalable Small-world Network Analysis
The in-memory graph layout or organization has a considerable impact on the time and energy efficiency of distributed memory graph computations. It affects memory locality, inter-task load balance, communication time, and overall memory utilization. Graph layout could refer to partitioning or replication of vertex and edge arrays, selective replication of data structures that hold meta-data, an...
متن کاملA Note on (Parallel) Depth- and Breadth-First Search by Arc Elimination
This note recapitulates an algorithmic observation for ordered Depth-First Search (DFS) in directed graphs that immediately leads to a parallel algorithm with linear speed-up for a range of processors for non-sparse graphs. The note extends the approach to ordered Breadth-First Search (BFS). With p processors, both DFS and BFS algorithms run in O(m/p + n) time steps on a shared-memory parallel ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1705.04590 شماره
صفحات -
تاریخ انتشار 2015